Rank in Wordlist | Word | Rank in Wordlist | Word |
---|---|---|---|
1 | on | 26 | välja |
2 | ja | 27 | veel |
3 | et | 28 | aasta |
4 | ei | 29 | juba |
5 | ka | 30 | See |
6 | kui | 31 | tuleb |
7 | see | 32 | väga |
8 | või | 33 | kuid |
9 | mis | 34 | vastu |
10 | ning | 35 | me |
11 | oma | 36 | vaid |
12 | oli | 37 | mitte |
13 | Eesti | 38 | Ma |
14 | siis | 39 | mida |
15 | seda | 40 | sest |
16 | ole | 41 | nende |
17 | aga | 42 | oleks |
18 | selle | 43 | kohta |
19 | ta | 44 | üle |
20 | kes | 45 | kas |
21 | nii | 46 | kus |
22 | Kui | 47 | krooni |
23 | ma | 48 | nagu |
24 | pole | 49 | Riigikogu |
25 | võib | 50 | eest |
The table shows the 50 most frequent words of the corpus. These are usually stopwords, that is, function words that carry little topical content.
This list is a good candidate for a first stopword list for a language.
A small, balanced corpus is usually enough to obtain a good list of high-frequency words. If a small corpus is dominated by one prominent topic, however, that topic's vocabulary will show up even among the top-ranked words.
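As a rough illustration of how such a candidate list could be derived from raw text, the following Python sketch counts tokens and returns the most frequent ones with their ranks. The function name `top_words` and the simple letter-run tokenizer are assumptions made for this example, not the method used to build the corpus. Case is preserved, which is why forms such as "see" and "See" would be counted separately, as in the table above.

```python
import re
from collections import Counter

def top_words(text, n=50):
    """Return (rank, word) pairs for the n most frequent tokens in text."""
    # Assumed tokenizer: maximal runs of Unicode letters; case is preserved.
    tokens = re.findall(r"[^\W\d_]+", text)
    counts = Counter(tokens)
    # most_common() sorts by descending count; ranks start at 1.
    return [(rank, word)
            for rank, (word, _) in enumerate(counts.most_common(n), start=1)]

sample = "ja on ja et ja on See see"
print(top_words(sample, 2))
```

On a real corpus, the resulting list would still need manual review before use as a stopword list, since topic-specific words can rank highly in a small or unbalanced sample.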
select w_id-100 as rank_in_wordlist, word from words where w_id>100 order by w_id limit 50;
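The query above can be tried out on a toy database. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the actual database system, which is not stated here. The two-column schema of `words`, the placeholder entries for IDs 1 to 100, and the helper name `top_ranked_words` are assumptions for illustration; the only thing taken from the query itself is that IDs up to 100 are skipped and that the rank is computed as `w_id - 100`.

```python
import sqlite3

def top_ranked_words(conn, offset=100, limit=50):
    """Run the ranking query; IDs up to `offset` are not counted as ranked words."""
    query = (
        "select w_id - ? as rank_in_wordlist, word "
        "from words where w_id > ? order by w_id limit ?"
    )
    return conn.execute(query, (offset, offset, limit)).fetchall()

# Toy in-memory table with an assumed minimal schema.
conn = sqlite3.connect(":memory:")
conn.execute("create table words (w_id integer primary key, word text)")

# Hypothetical placeholder rows for the skipped IDs 1..100.
rows = [(i, f"<reserved{i}>") for i in range(1, 101)]
# Words assumed to be stored in descending frequency order from ID 101 on.
rows += [(101, "on"), (102, "ja"), (103, "et")]
conn.executemany("insert into words values (?, ?)", rows)

top = top_ranked_words(conn, limit=2)
print(top)
```

Passing the offset and limit as bound parameters keeps the query text identical in structure to the original while making the cut-off easy to vary.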
3.4 Sample words for different frequency ranges